Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching
Training and inference with graph neural networks (GNNs) on massive graphs
have been actively studied since the inception of GNNs, owing to their
widespread use and success in applications such as recommendation systems and
financial forensics. This paper is concerned with minibatch training and
inference with GNNs that employ node-wise sampling in distributed settings,
where the necessary partitioning of vertex features across distributed storage
causes feature communication to become a major bottleneck that hampers
scalability. To significantly reduce the communication volume without
compromising prediction accuracy, we propose a policy for caching data
associated with frequently accessed vertices in remote partitions. The proposed
policy is based on an analysis of vertex-wise inclusion probabilities (VIP)
during multi-hop neighborhood sampling, which may expand the neighborhood far
beyond the partition boundaries of the graph. VIP analysis not only enables the
elimination of the communication bottleneck, but it also offers a means to
organize in-memory data by prioritizing GPU storage for the most frequently
accessed vertex features. We present SALIENT++, which extends the prior
state-of-the-art SALIENT system to work with partitioned feature data and
leverages the VIP-driven caching policy. SALIENT++ retains the local training
efficiency and scalability of SALIENT by using a deep pipeline and drastically
reducing communication volume while consuming only a fraction of the storage
required by SALIENT. We provide experimental results with the Open Graph
Benchmark data sets and demonstrate that training a 3-layer GraphSAGE model
with SALIENT++ on 8 single-GPU machines is 7.1× faster than with SALIENT on 1
single-GPU machine, and 12.7× faster than with DistDGL on 8 single-GPU machines.

Comment: MLSys 2023. Code is available at
https://github.com/MITIBMxGraph/SALIENT_plusplus
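
To make the VIP-driven caching idea concrete, below is a minimal, hypothetical
Python sketch. It is not the SALIENT++ implementation: the paper derives
vertex-wise inclusion probabilities analytically, whereas this sketch
approximates them by Monte Carlo simulation of GraphSAGE-style node-wise
sampling. The function names (estimate_vip, choose_cache), the fanout values,
and the toy partitioned graph are all illustrative assumptions.

    # Hypothetical sketch only: estimate vertex-wise inclusion probabilities
    # (VIP) by simulating multi-hop node-wise sampling from local seeds, then
    # cache the remote vertices most likely to be touched. SALIENT++ itself
    # computes these probabilities analytically rather than by simulation.
    import random
    from collections import Counter

    def estimate_vip(adj, seeds, fanouts, trials=200):
        """Estimate how often each vertex is included when multi-hop
        node-wise sampling runs from the given seed vertices."""
        hits = Counter()
        for _ in range(trials):
            frontier = set(seeds)
            visited = set(seeds)
            for fanout in fanouts:  # one sampling hop per GNN layer
                nxt = set()
                for v in frontier:
                    nbrs = adj.get(v, [])
                    k = min(fanout, len(nbrs))
                    nxt.update(random.sample(nbrs, k))
                visited |= nxt
                frontier = nxt
            hits.update(visited)  # count each vertex once per trial
        return {v: c / trials for v, c in hits.items()}

    def choose_cache(vip, local, budget):
        """Cache the remote vertices with the highest inclusion probability,
        so their features need not be fetched over the network."""
        remote = sorted(((p, v) for v, p in vip.items() if v not in local),
                        reverse=True)
        return {v for _, v in remote[:budget]}

    if __name__ == "__main__":
        # Toy partitioned graph: vertices 0-3 are local, 4-7 live remotely.
        adj = {0: [1, 4, 5], 1: [0, 2, 6], 2: [1, 3, 7], 3: [2, 4],
               4: [0, 3], 5: [0], 6: [1], 7: [2]}
        vip = estimate_vip(adj, seeds=[0, 1, 2, 3], fanouts=[2, 2])
        print(choose_cache(vip, local={0, 1, 2, 3}, budget=2))

The same ranking by inclusion probability can also prioritize which cached
features reside in GPU memory versus host memory, as the abstract describes.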